Tiling Databases

Authors

  • Floris Geerts
  • Bart Goethals
  • Taneli Mielikäinen
Abstract

In this paper, we consider 0/1 databases and provide an alternative way of extracting knowledge from such databases using tiles. A tile is a region in the database consisting solely of ones. The interestingness of a tile is measured by the number of ones it consists of, i.e., its area. We present an efficient method for extracting all tiles with area at least a given threshold. A collection of tiles constitutes a tiling. We regard tilings that have a large area and consist of a small number of tiles as appealing summaries of the large database. We analyze the computational complexity of several algorithmic tasks related to finding such tilings. We develop an approximation algorithm for finding tilings which approximates the optimal solution within reasonable factors. We present a preliminary experimental evaluation on real data sets.
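The definitions in the abstract can be made concrete with a small sketch. The code below is only an illustration, not the efficient method the paper presents: it enumerates column subsets by brute force, which is exponential in the number of columns. The function names (`tile_for`, `area`, `large_tiles`, `tiling_area`) and the toy database are hypothetical, and the sketch assumes that the area of a tiling counts each covered one exactly once, even where tiles overlap.

```python
from itertools import combinations

def tile_for(db, columns):
    """Return the largest tile for a column set: every row that has
    a 1 in all of the chosen columns, paired with those columns."""
    rows = [r for r, row in enumerate(db) if all(row[c] == 1 for c in columns)]
    return rows, list(columns)

def area(tile):
    """Area of a tile = number of ones it contains = rows x columns."""
    rows, columns = tile
    return len(rows) * len(columns)

def large_tiles(db, min_area):
    """Brute-force sketch: test every non-empty column subset and keep
    the tiles whose area reaches the threshold."""
    n_cols = len(db[0]) if db else 0
    found = []
    for k in range(1, n_cols + 1):
        for cols in combinations(range(n_cols), k):
            tile = tile_for(db, cols)
            if area(tile) >= min_area:
                found.append(tile)
    return found

def tiling_area(tiles):
    """Area of a tiling (assumed here): distinct cells covered by any tile."""
    covered = {(r, c) for rows, cols in tiles for r in rows for c in cols}
    return len(covered)

# Toy 0/1 database: 3 rows, 3 columns, 7 ones in total.
db = [
    [1, 1, 0],
    [1, 1, 1],
    [0, 1, 1],
]

tiles = large_tiles(db, min_area=4)
print(tiles)               # two 2x2 tiles
print(tiling_area(tiles))  # cells covered by the two tiles together
```

In this toy database the two tiles of area 4 overlap in one cell, so together they cover 7 cells, which is every one in the database. This is the sense in which a few large tiles can serve as a compact summary.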

Similar resources

Polygonal tiling of some surfaces containing fullerene molecules

A tiling of a surface is a decomposition of the surface into pieces, i.e. tiles, which cover it without gaps or overlaps. In this paper some special polygonal tiling of sphere, ellipsoid, cylinder, and torus as the most abundant shapes of fullerenes are investigated.


BSRD: a repository for bacterial small regulatory RNA

In bacteria, small regulatory non-coding RNAs (sRNAs) are the most abundant class of post-transcriptional regulators. They are involved in diverse processes including quorum sensing, stress response, virulence and carbon metabolism. Recent developments in high-throughput techniques, such as genomic tiling arrays and RNA-Seq, have allowed efficient detection and characterization of bacterial sRN...


Gene Prediction with Conditional Random Fields

Given a sequence of DNA nucleotide bases, the task of gene prediction is to find subsequences of bases that encode proteins. Reasonable performance on this task has been achieved using generatively trained sequence models, such as hidden Markov models. We propose instead the use of a discriminatively trained sequence model, the conditional random field (CRF). CRFs can naturally incorporate arbi...


BeetleBase in 2010: revisions to provide comprehensive genomic information for Tribolium castaneum

BeetleBase (http://www.beetlebase.org) has been updated to provide more comprehensive genomic information for the red flour beetle Tribolium castaneum. The database contains genomic sequence scaffolds mapped to 10 linkage groups (genome assembly release Tcas_3.0), genetic linkage maps, the official gene set, Reference Sequences from NCBI (RefSeq), predicted gene models, ESTs and whole-genome ti...


Ranked Tiling

Tiling is a well-known pattern mining technique. Traditionally, it discovers large areas of ones in binary databases or matrices, where an area is defined by a set of rows and a set of columns. In this paper, we introduce the novel problem of ranked tiling, which is concerned with finding interesting areas in ranked data. In this data, each transaction defines a complete ranking of the columns....



Publication date: 2004